gini impurity
- North America > United States (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
On the Gini-impurity Preservation For Privacy Random Forests
Random forests have been one successful ensemble algorithms in machine learning. Various techniques have been utilized to preserve the privacy of random forests from anonymization, differential privacy, homomorphic encryption, etc., whereas it rarely takes into account some crucial ingredients of learning algorithm. This work presents a new encryption to preserve data's Gini impurity, which plays a crucial role during the construction of random forests. Our basic idea is to modify the structure of binary search tree to store several examples in each node, and encrypt data features by incorporating label and order information. Theoretically, we prove that our scheme preserves the minimum Gini impurity in ciphertexts without decrypting, and present the security guarantee for encryption. For random forests, we encrypt data features based on our Gini-impurity-preserving scheme, and take the homomorphic encryption scheme CKKS to encrypt data labels due to their importance and privacy. We conduct extensive experiments to show the effectiveness, efficiency and security of our proposed method.
ACT: Agentic Classification Tree
Grari, Vincent, Arni, Tim, Laugel, Thibault, Lamprier, Sylvain, Zou, James, Detyniecki, Marcin
When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable, and auditable, a requirement increasingly expected by regulations. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such as text. In practice, large language models (LLMs) are widely used for such data, yet prompting strategies such as chain-of-thought or prompt optimization still rely on free-form reasoning, limiting their ability to ensure trustworthy behaviors. We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question, refined through impurity-based evaluation and LLM feedback via TextGrad. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Consumer Health (0.68)
- North America > United States (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
On the Gini-impurity Preservation For Privacy Random Forests
Random forests have been one successful ensemble algorithms in machine learning. Various techniques have been utilized to preserve the privacy of random forests from anonymization, differential privacy, homomorphic encryption, etc., whereas it rarely takes into account some crucial ingredients of learning algorithm. This work presents a new encryption to preserve data's Gini impurity, which plays a crucial role during the construction of random forests. Our basic idea is to modify the structure of binary search tree to store several examples in each node, and encrypt data features by incorporating label and order information. Theoretically, we prove that our scheme preserves the minimum Gini impurity in ciphertexts without decrypting, and present the security guarantee for encryption.
Causal Discovery and Classification Using Lempel-Ziv Complexity
Dhruthi, null, Nagaraj, Nithin, B, Harikrishnan N
Inferring causal relationships in the decision-making processes of machine learning algorithms is a crucial step toward achieving explainable Artificial Intelligence (AI). In this research, we introduce a novel causality measure and a distance metric derived from Lempel-Ziv (LZ) complexity. We explore how the proposed causality measure can be used in decision trees by enabling splits based on features that most strongly \textit{cause} the outcome. We further evaluate the effectiveness of the causality-based decision tree and the distance-based decision tree in comparison to a traditional decision tree using Gini impurity. While the proposed methods demonstrate comparable classification performance overall, the causality-based decision tree significantly outperforms both the distance-based decision tree and the Gini-based decision tree on datasets generated from causal models. This result indicates that the proposed approach can capture insights beyond those of classical decision trees, especially in causally structured data. Based on the features used in the LZ causal measure based decision tree, we introduce a causal strength for each features in the dataset so as to infer the predominant causal variables for the occurrence of the outcome.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)
- North America > United States > Wisconsin (0.05)
- Asia > India > Goa (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
GPTree: Towards Explainable Decision-Making via LLM-powered Decision Trees
Xiong, Sichao, Ihlamur, Yigit, Alican, Fuat, Yin, Aaron Ontoyin
Traditional decision tree algorithms are explainable but struggle with non-linear, high-dimensional data, limiting its applicability in complex decision-making. Neural networks excel at capturing complex patterns but sacrifice explainability in the process. In this work, we present GPTree, a novel framework combining explainability of decision trees with the advanced reasoning capabilities of LLMs. GPTree eliminates the need for feature engineering and prompt chaining, requiring only a task-specific prompt and leveraging a tree-based structure to dynamically split samples. We also introduce an expert-in-the-loop feedback mechanism to further enhance performance by enabling human intervention to refine and rebuild decision paths, emphasizing the harmony between human expertise and machine intelligence. Our decision tree achieved a 7.8% precision rate for identifying "unicorn" startups at the inception stage of a startup, surpassing gpt-4o with few-shot learning as well as the best human decision-makers (3.1% to 5.6%).
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)
Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization
Lin, Guopeng, Han, Weili, Ruan, Wenqiang, Zhou, Ruisheng, Song, Lushan, Li, Bingshuai, Shao, Yunfeng
Multi-party training frameworks for decision trees based on secure multi-party computation enable multiple parties to train high-performance models on distributed private data with privacy preservation. The training process essentially involves frequent dataset splitting according to the splitting criterion (e.g. Gini impurity). However, existing multi-party training frameworks for decision trees demonstrate communication inefficiency due to the following issues: (1) They suffer from huge communication overhead in securely splitting a dataset with continuous attributes. (2) They suffer from huge communication overhead due to performing almost all the computations on a large ring to accommodate the secure computations for the splitting criterion. In this paper, we are motivated to present an efficient three-party training framework, namely Ents, for decision trees by communication optimization. For the first issue, we present a series of training protocols based on the secure radix sort protocols to efficiently and securely split a dataset with continuous attributes. For the second issue, we propose an efficient share conversion protocol to convert shares between a small ring and a large ring to reduce the communication overhead incurred by performing almost all the computations on a large ring. Experimental results from eight widely used datasets show that Ents outperforms state-of-the-art frameworks by $5.5\times \sim 9.3\times$ in communication sizes and $3.9\times \sim 5.3\times$ in communication rounds. In terms of training time, Ents yields an improvement of $3.5\times \sim 6.7\times$. To demonstrate its practicality, Ents requires less than three hours to securely train a decision tree on a widely used real-world dataset (Skin Segmentation) with more than 245,000 samples in the WAN setting.
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.05)
- Asia > China (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
Hacking a surrogate model approach to XAI
Wilhelm, Alexander, Zweig, Katharina A.
In recent years, the number of new applications for highly complex AI systems has risen significantly. Algorithmic decision-making systems (ADMs) are one of such applications, where an AI system replaces the decision-making process of a human expert. As one approach to ensure fairness and transparency of such systems, explainable AI (XAI) has become more important. One variant to achieve explainability are surrogate models, i.e., the idea to train a new simpler machine learning model based on the input-output-relationship of a black box model. The simpler machine learning model could, for example, be a decision tree, which is thought to be intuitively understandable by humans. However, there is not much insight into how well the surrogate model approximates the black box. Our main assumption is that a good surrogate model approach should be able to bring such a discriminating behavior to the attention of humans; prior to our research we assumed that a surrogate decision tree would identify such a pattern on one of its first levels. However, in this article we show that even if the discriminated subgroup - while otherwise being the same in all categories - does not get a single positive decision from the black box ADM system, the corresponding question of group membership can be pushed down onto a level as low as wanted by the operator of the system. We then generalize this finding to pinpoint the exact level of the tree on which the discriminating question is asked and show that in a more realistic scenario, where discrimination only occurs to some fraction of the disadvantaged group, it is even more feasible to hide such discrimination. Our approach can be generalized easily to other surrogate models.
- North America > United States > New York (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists
Ly, Timothée, Ferry, Julien, Huguet, Marie-José, Gambs, Sébastien, Aivodji, Ulrich
Differentially-private (DP) mechanisms can be embedded into the design of a machine learningalgorithm to protect the resulting model against privacy leakage, although this often comes with asignificant loss of accuracy. In this paper, we aim at improving this trade-off for rule lists modelsby establishing the smooth sensitivity of the Gini impurity and leveraging it to propose a DP greedyrule list algorithm. In particular, our theoretical analysis and experimental results demonstrate thatthe DP rule lists models integrating smooth sensitivity have higher accuracy that those using otherDP frameworks based on global sensitivity.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Information Technology > Security & Privacy (1.00)
- Law (0.68)